Morphology-based language modeling for conversational Arabic speech recognition
نویسندگان
چکیده
Language modeling for large-vocabulary conversational Arabic speech recognition is faced with the problem of the complex morphology of Arabic, which increases the perplexity and out-of-vocabulary rate. This problem is compounded by the enormous dialectal variability and differences between spoken and written language. In this paper we investigate improvements in Arabic language modeling by developing various morphology-based language models. We present four different approaches to morphology-based language modeling, including a novel technique called factored language models. Experimental results are presented for both rescoring and first-pass recognition experiments.
منابع مشابه
Morphology-based language modeling for arabic speech recognition
Language modeling is a difficult problem for languages with rich morphology. In this paper we investigate the use of morphology-based language models at different stages in a speech recognition system for conversational Arabic. Classbased and single-stream factored language models using morphological word representations are applied within an N-best list rescoring framework. In addition, we exp...
متن کاملDevelopment of a conversational telephone speech recognizer for Levantine Arabic
Many languages, including Arabic, are characterized by a wide variety of different dialects that often differ strongly from each other. When developing speech technology for dialect-rich languages, the portability and reusability of data, algorithms, and system components becomes extremely important. In this paper, we describe the development of a large-vocabulary speech recognition system for ...
متن کاملOff-line Arabic Handwritten Recognition Using a Novel Hybrid HMM-DNN Model
In order to facilitate the entry of data into the computer and its digitalization, automatic recognition of printed texts and manuscripts is one of the considerable aid to many applications. Research on automatic document recognition started decades ago with the recognition of isolated digits and letters, and today, due to advancements in machine learning methods, efforts are being made to iden...
متن کاملContext-aware Language Modeling for Conversational Speech Translation
Context plays a critical role in the understanding of language, especially conversational speech. However, few approaches exist to utilize the external contextual knowledge which is readily available to practical speech translation systems deployed in the field. In this work, we propose a novel framework to integrate context in the language models used for conversational speech translation. The...
متن کاملArabic Language Modeling with Stem-derived Morphemes for Automatic Speech Recognition
The goal of this dissertation is to introduce a method for deriving morphemes from Arabic words using stem patterns, a feature of Arabic morphology. The motivations are three-fold: modeling with morphemes rather than words should help address the out-ofvocabulary problem; working with stem patterns should prove to be a cross-dialectally valid method for deriving morphemes using a small amount o...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Computer Speech & Language
دوره 20 شماره
صفحات -
تاریخ انتشار 2006